HEBR: A High Efficiency Block Reporting Scheme for HDFS
Abstract
The Hadoop platform is widely used to manage, analyze, and transform large data sets in many kinds of systems. Hadoop has two basic components: 1) a distributed file system (HDFS) and 2) a computation framework (MapReduce). HDFS stores data on ordinary commodity machines that run DataNode processes (DataNodes), while a commodity machine running the NameNode process (NameNode) maintains the file system's metadata. Each DataNode periodically sends the NameNode a list of all the blocks it currently stores, known as a block report. The NameNode processes these block reports to build the mapping from file blocks to the DataNodes that hold them. Block reports place a heavy internal load on a Hadoop cluster, because they consume the computation resources of both the DataNodes and the NameNode as well as the cluster's network bandwidth. This paper proposes HEBR, a new block report protocol for Hadoop that significantly reduces both computational and communication overhead and thereby substantially improves overall system performance; the proposal is supported by extensive experimental results.
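To make the cost described above concrete, here is a minimal sketch of how a NameNode might apply a conventional *full* block report. It is not HEBR, whose details are not given in this abstract. The class name `NameNodeBlockMap` and its methods are hypothetical, and real HDFS reports carry more per-block data (length, generation stamp, replica state) than the bare block IDs assumed here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class NameNodeBlockMap:
    """Hypothetical, simplified model of the NameNode's block-location map."""

    # block ID -> set of DataNode IDs that currently report holding that block
    locations: dict = field(default_factory=lambda: defaultdict(set))
    # DataNode ID -> set of block IDs from that DataNode's most recent report
    last_report: dict = field(default_factory=dict)

    def process_full_report(self, datanode_id, block_ids):
        """Apply a full block report: every block the DataNode holds right now."""
        new_blocks = set(block_ids)
        old_blocks = self.last_report.get(datanode_id, set())
        # Blocks absent from this report are no longer on that DataNode.
        for block in old_blocks - new_blocks:
            self.locations[block].discard(datanode_id)
            if not self.locations[block]:
                del self.locations[block]
        # Every block in the report is (re)recorded as located on that DataNode.
        for block in new_blocks:
            self.locations[block].add(datanode_id)
        self.last_report[datanode_id] = new_blocks
        # The NameNode must examine every entry, even if nothing has changed.
        return len(new_blocks)

    def where_is(self, block_id):
        """Return the DataNodes believed to hold block_id, sorted by ID."""
        return sorted(self.locations.get(block_id, set()))


if __name__ == "__main__":
    nn = NameNodeBlockMap()
    nn.process_full_report("dn1", [101, 102, 103])
    nn.process_full_report("dn2", [102, 103])
    nn.process_full_report("dn1", [101, 103])  # dn1 no longer holds block 102
    print(nn.where_is(102))
```

Note that resending an unchanged report still costs work proportional to its full size. That repeated, mostly redundant processing, multiplied across every DataNode and every reporting interval, is the overhead the abstract describes.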